Shallow Semantically-Informed PBSMT and HPBSMT
Abstract
This paper describes shallow semantically-informed Hierarchical Phrase-Based SMT (HPBSMT) and Phrase-Based SMT (PBSMT) systems developed at Dublin City University for the EN-ES and ES-EN translation tasks at the Workshop on Statistical Machine Translation (WMT 13). The system uses PBSMT and HPBSMT decoders with multiple language models (LMs), but it runs only one decoding path, which is chosen before translation starts. The paper therefore does not present a multi-engine system combination. We investigate three types of shallow semantics: (i) Quality Estimation (QE) score, (ii) genre ID, and (iii) context ID derived from context-dependent language models. Our results show an absolute improvement of 0.8 BLEU points for EN-ES and 0.7 BLEU points for ES-EN over the standard PBSMT system (the single best system). Note that we developed this method for cases in which standard confusion network-based system combination is ineffective, such as when there are only two inputs.
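The idea of committing to a single decoding path before translation can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the routing table, the LM names, the QE threshold, and the fallback rule are all hypothetical, since the abstract does not say how the three shallow-semantic signals are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodingPath:
    decoder: str         # "PBSMT" or "HPBSMT"
    language_model: str  # hypothetical LM identifier


# Hypothetical routing table: (genre ID, context ID) -> decoding path.
ROUTES = {
    (0, 0): DecodingPath("PBSMT", "lm_genre0_ctx0"),
    (0, 1): DecodingPath("PBSMT", "lm_genre0_ctx1"),
    (1, 0): DecodingPath("HPBSMT", "lm_genre1_ctx0"),
}
DEFAULT_PATH = DecodingPath("PBSMT", "lm_general")
QE_THRESHOLD = 0.5  # hypothetical cut-off on the QE score


def select_path(qe_score: float, genre_id: int, context_id: int) -> DecodingPath:
    """Choose exactly one decoding path before translation starts."""
    path = ROUTES.get((genre_id, context_id), DEFAULT_PATH)
    # Assumed rule: a low QE score for a phrase-based path switches the
    # sentence to the hierarchical decoder while keeping the same LM.
    if path.decoder == "PBSMT" and qe_score < QE_THRESHOLD:
        return DecodingPath("HPBSMT", path.language_model)
    return path
```

Because the choice is made once per input, only one decoder runs for each sentence and no outputs have to be merged afterwards, which is the contrast with multi-engine system combination that the abstract draws.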
Similar resources
Pre-ordering of phrase-based machine translation input in translation workflow
Word reordering is a difficult task for decoders when the languages involved have a significant difference in syntax. Phrase-based statistical machine translation (PBSMT), preferred in commercial settings due to its maturity, is particularly prone to errors in long range reordering. Source sentence pre-ordering, as a pre-processing step before PBSMT, proved to be an efficient solution that can ...
Integrating Rules and Dictionaries from Shallow-Transfer Machine Translation into Phrase-Based Statistical Machine Translation
We describe a hybridisation strategy whose objective is to integrate linguistic resources from shallow-transfer rule-based machine translation (RBMT) into phrase-based statistical machine translation (PBSMT). It basically consists of enriching the phrase table of a PBSMT system with bilingual phrase pairs matching transfer rules and dictionary entries from a shallow-transfer RBMT system. This n...
UOW: Semantically Informed Text Similarity
The UOW submissions to the Semantic Textual Similarity task at SemEval-2012 use a supervised machine learning algorithm along with features based on lexical, syntactic and semantic similarity metrics to predict the semantic equivalence between a pair of sentences. The lexical metrics are based on word overlap. A shallow syntactic metric is based on the overlap of base-phrase labels. The semantic...
POSTECH's Statistical Machine Translation Systems for NTCIR-9 PatentMT Task (English-to-Japanese)
Dependency labels: LA: advcl (adverbial clause modifier); LO: (default); LR: (default); LM: aux (auxiliary), auxpass (passive aux), neg (negation modifier), cop (copular); RM: prt (phrasal verb particle); RO: conj (conjunction), cc (coordination), punt (punctuation).
Run ID | E’ Generation | E-E’ | E’-J | BLEU | NIST | RIBES
KLE-01 | Transfer | PBSMT | Hiero | 0.3404 | 8.2467 | 0.690476
KLE-02 | Transfer | PBSMT | PBSMT | 0.2982 | 7.8411 | 0.6...
Learning Machine Translation from In-domain and Out-of-domain Data
The performance of Phrase-Based Statistical Machine Translation (PBSMT) systems mostly depends on training data. Many papers have investigated how to create new resources in order to increase the size of the training corpus in an attempt to improve PBSMT performance. In this work, we analyse and characterize the way in which the in-domain and out-of-domain performance of PBSMT is impacted when t...
Publication date: 2013